Turbo similarity searching: Effect of fingerprint and dataset on virtual-screening performance
Abstract
Turbo similarity searching (TSS) uses information about the nearest neighbours of the reference structure in a conventional chemical similarity search to increase the effectiveness of virtual screening, with a data fusion approach being used to combine the nearest-neighbour information. A previous paper suggested that the approach was highly effective in operation; this paper tests it further using a range of databases and structural representations. Searches were carried out on three different databases of chemical structures, using seven different types of fingerprint, as well as molecular holograms, physicochemical properties, topological indices and reduced graphs. The results show that TSS can indeed enhance retrieval, but normally only if the initial similarity search that acts as its starting point is already reasonably effective. In other cases, a modified version of TSS that uses the nearest-neighbour information for approximate machine learning can be used effectively. Whilst useful for qualitative (active/inactive) predictions of biological activity, TSS does not appear to exhibit any predictive power when quantitative property data are available.
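A minimal Python sketch of the two ideas in the abstract follows, under stated assumptions: fingerprints are sets of on-bit indices, similarity is the Tanimoto coefficient, and the k+1 searches (reference plus its nearest neighbours) are combined with MAX fusion. The abstract does not say which fusion rule or learning method the paper uses, so the machine-learning variant here (top-ranked molecules as pseudo-actives, bottom-ranked ones as pseudo-inactives, bits weighted by smoothed log-odds in the style of substructural analysis) is an illustrative assumption. All function names are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Tuple

# Assumed representation: a fingerprint is the set of its "on" bit indices.
Fingerprint = FrozenSet[int]
Ranking = List[Tuple[str, float]]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient c / (a + b - c) for two binary fingerprints."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def similarity_search(ref: Fingerprint, db: Dict[str, Fingerprint]) -> Ranking:
    """Conventional search: rank every molecule by similarity to one reference."""
    scores = [(mol_id, tanimoto(ref, fp)) for mol_id, fp in db.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def turbo_similarity_search(ref: Fingerprint, db: Dict[str, Fingerprint],
                            k: int = 5) -> Ranking:
    """TSS sketch: add the reference's k nearest neighbours as extra queries
    and combine the k + 1 searches with MAX fusion (an assumed fusion rule)."""
    neighbours = [db[mol_id] for mol_id, _ in similarity_search(ref, db)[:k]]
    fused: Dict[str, float] = {}
    for query in [ref] + neighbours:
        for mol_id, score in similarity_search(query, db):
            fused[mol_id] = max(fused.get(mol_id, 0.0), score)
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


def substructural_weights(actives: List[Fingerprint],
                          inactives: List[Fingerprint]) -> Dict[int, float]:
    """Laplace-smoothed log-odds weight for each bit seen in the training sets."""
    weights = {}
    for bit in set().union(*actives, *inactives):
        p_act = (sum(bit in fp for fp in actives) + 1) / (len(actives) + 2)
        p_inact = (sum(bit in fp for fp in inactives) + 1) / (len(inactives) + 2)
        weights[bit] = math.log(p_act / p_inact)
    return weights


def turbo_ml_search(ref: Fingerprint, db: Dict[str, Fingerprint],
                    k_active: int = 5, k_inactive: int = 5) -> Ranking:
    """Hypothetical ML variant: the reference and its top-ranked neighbours are
    pseudo-actives; the bottom-ranked molecules are pseudo-inactives."""
    initial = similarity_search(ref, db)
    actives = [ref] + [db[m] for m, _ in initial[:k_active]]
    inactives = [db[m] for m, _ in initial[len(initial) - k_inactive:]]
    weights = substructural_weights(actives, inactives)
    # Score each molecule by summing the weights of its on bits; unseen bits count 0.
    scores = [(m, sum(weights.get(bit, 0.0) for bit in fp)) for m, fp in db.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Because each nearest neighbour is also used as a query, it scores 1.0 against itself under MAX fusion and is pinned to the top of the fused list; SUM or rank-based fusion would behave differently. The sketch also assumes the database is larger than `k_active + k_inactive`, so the pseudo-active and pseudo-inactive sets do not overlap.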
Related articles
Similarity-based virtual screening using 2D fingerprints.
This paper summarizes recent work at the University of Sheffield on virtual screening methods that use 2D fingerprint measures of structural similarity. A detailed comparison of a large number of similarity coefficients demonstrates that the well-known Tanimoto coefficient remains the method of choice for the computation of fingerprint-based similarity, despite possessing some inherent biases r...
Statistical modeling of value distributions of similarity coefficients in virtual screening and its application to predicting fingerprint search performance
Similarity searching using fingerprints is a popular ligand-based virtual screening approach. The Tanimoto coefficient (Tc) is the most widely used measure for quantifying fingerprint similarity. In general, it is very difficult to assess the significance of the similarity of two molecules solely based on their calculated Tc values. In the literature, Tc cut-off values are frequently intuitively...
Bayesian Inference Network for Molecular Similarity Searching Using 2D Fingerprints and Multiple Reference Structures
2D-fingerprint-based similarity searching using a single bioactive reference is the most popular and effective virtual screening tool. In our last paper, we introduced a novel method for similarity searching using Bayesian inference network (BIN). In this study, we have compared BIN with other similarity searching methods when multiple bioactive reference molecules are available. Three dif...
Maximum Common Substructure-Based Data Fusion in Similarity Searching
Data fusion has been shown to work very well when applied to fingerprint-based similarity searching, yet little is known of its application to maximum common substructure (MCS)-based similarity searching. Here, we focus on two similarity search applications of the MCS. Typically, the number of bonds in the MCS, as well as the bonds in the two molecules being compared, are used in a simila...
Analysis and comparison of 2D fingerprints: insights into database screening performance using eight fingerprint methods
Virtual screening is a widely used strategy in modern drug discovery and 2D fingerprint similarity is an important tool that has been successfully applied to retrieve active compounds from large datasets. However, it is not always straightforward to select an appropriate fingerprint method and associated settings for a given problem. Here, we applied eight different fingerprint methods, as impl...
Journal:
- Statistical Analysis and Data Mining
Volume 2
Publication date: 2009